Proper Name Extraction from Non-Journalistic Texts
Abstract
This paper discusses the influence of the corpus on the automatic identification of proper names in texts. Techniques developed for the newswire genre are generally not sufficient to deal with larger corpora containing texts that do not follow strict writing constraints (for example, e-mail messages, transcriptions of oral conversations, etc.). After a brief review of the research performed on news texts, we present some of the problems involved in the analysis of two different corpora: e-mails and hand-transcribed telephone conversations. Once the sources of errors have been presented, we describe an approach to adapt a proper name extraction system developed for newspaper texts to the analysis of e-mail
Similar resources
Textual Similarity based on Proper Names
Proper names represent about 10% of English or French newspaper articles. Their quantity and informational quality are already used in different Information Extraction systems. Proper names have been widely studied in the MUC conferences designed to promote research in Information Extraction. We have created our own named entity extraction tool based on a linguistic description with automata. Th...
Competition of Discourses in Journalistic Translation: Diplomatic Negotiations in Focus
We sought to understand whether, how, and why the translated journalistic texts related to the Iranian nuclear negotiations were manipulated. To this end, we monitored a news agency’s Webpage in a time span of 46 days that began 3 days before Almaty I nuclear talks and ended 3 days after Almaty II talks. Monitoring resulted in a corpus made up of 36 target texts p...
Multilingual corpora with coreferential annotation of person entities
This paper presents three corpora with coreferential annotation of person entities for Portuguese, Galician and Spanish. They contain coreference links between several types of pronouns (including elliptical, possessive, indefinite, demonstrative, relative and personal clitic and non-clitic pronouns) and nominal phrases (including proper nouns). Some statistics have been computed, showing distr...
From Academic to Journalistic Texts: A Qualitative Analysis of the Evaluative Language of Science
This study examined academic articles and journalistic reports in 5 disciplinary areas to explore how similar contents might attitudinally be realized in two different genres. To this end, 25 research articles and 210 news reports were carefully selected and underwent detailed discourse semantic and grammatical analyses with the purpose of identifying the evaluative linguistic patterns....
Resolución de Correferencia de Nombres de Persona para Extracción de Información Biográfica
Information extraction systems need a previous processing step in order to recognize coreferential elements, such as personal name variants. This paper has two aims: the first is to describe the main types of personal name coreference found in encyclopedic and journalistic texts in Spanish. Furthermore, we introduce an algorithm that solves most coreferential links between personal name variant...